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Abstract — This  paper  proposes  a  joint  multitarget  (JoM)  esti¬ 
mator  for  the  joint  target  detection  and  tracking  (JoTT)  filter. 
An  efficient  choice  to  the  unknown  JoM  estimation  constant 
(i.e.,  hypervolume  around  target  state  estimate)  is  proposed  as 
a  Pareto-optimal  solution  to  a  multi-objective  nonlinear  convex 
optimization  problem.  The  multi-objective  function  is  formulated 
as  two  convex  objective  functions  in  conflict.  The  first  objective 
function  is  the  information  theoretic  part  of  the  problem  and 
aims  for  entropy  maximization,  while  the  second  one  arises  from 
the  constraint  in  the  definition  of  the  JoM  estimator  and  aims 
to  improve  the  accuracy  of  the  JoM  estimates.  The  Pareto- 
optimal  solution  is  obtained  using  the  weighted  sum  method, 
where  objective  weights  are  determined  as  linear  predictions  from 
autoregressive  models.  In  contrast  to  the  marginal  multitarget 
(MaM)  estimator,  the  “target-present”  decision  from  the  JoM 
estimator  depends  on  the  spatial  information  as  well  as  the 
cardinality  information  in  the  finite-set  statistics  (FISST)  density. 
The  simulation  results  demonstrate  that  the  JoM  estimator 
achieves  better  track  management  performance  in  terms  of  track 
confirmation  latency  and  track  maintenance  than  the  MaM 
estimator  for  different  values  of  detection  probability.  However, 
the  proposed  JoM  estimator  suffers  from  track  termination 
latency  more  than  the  MaM  estimator  since  the  localization 
performance  of  the  JoTT  filter  does  deteriorate  gradually  after 
target  termination. 

Index  Terms — Target  tracking,  JoM  estimator,  Bernoulli  RFS, 
JoTT  filter,  track  management. 

I.  Introduction 

Target  tracking  is  the  process  of  estimating  the  state  of  a 
dynamic  object  by  filtering  noisy  measurements  in  the  pres¬ 
ence  of  false  alarms  and  missed  detections.  The  whole  process 
can  be  divided  into  track  confirmation,  track  maintenance,  and 
track  termination  functions.  Hence,  it  is  necessary  to  verify 
the  existence  of  the  target  from  the  received  measurements. 
A  number  of  statistical  algorithms  have  been  proposed  for  the 
detection  and  tracking  of  single  (or  multiple)  target(s)  [1].  A 
recent  innovation  in  the  area  of  target  detection  and  tracking 
is  in  the  application  of  the  Random  Finite  Sets  (RFS)  using 
the  finite-set  statistics  (FISST)  [2],  [3], 

The  RFS  formalism  of  the  Bayesian  multitarget  filter  pro¬ 
vides  a  formal  mechanism  for  propagating  and  updating  FISST 
densities.  Using  the  Almost  Parallel  Worlds  Principle  (AP- 
WOP)  along  with  the  relationship  between  the  FISST  prob¬ 
ability  and  the  measure  theoretic  probability,  some  statistical 
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concepts  and  techniques  in  filtering  theory  and  information 
theory  can  be  established  for  the  RFS  formalism  [2],  [3]. 
However,  the  conventional  single  target  state  estimators  (e.g., 
the  maximum  a  posteriori  (MAP)  estimator  and  the  expected 
a  posteriori  (EAP)  estimator)  are  undefined  for  RFS  based 
multitarget  filters  [2],  [4],  Hence,  two  Bayesian  optimal  es¬ 
timators  were  proposed  to  obtain  the  multitarget  states  from 
FISST  densities.  The  first  multitarget  state  estimator  is  called 
the  marginal  multitarget  (MaM)  estimator.  This  estimator  only 
considers  the  cardinality  information  (i.e.,  the  number  of 
elements  of  a  given  RFS)  in  FISST  densities.  The  second  mul¬ 
titarget  state  estimator  is  called  the  joint  multitarget  (JoM)  esti¬ 
mator.  This  estimator,  as  its  name  suggests,  considers  both  the 
cardinality  and  spatial  information  related  to  multitarget  states 
in  FISST  densities.  These  two  estimators  are  Bayesian  optimal, 
i.e.,  they  minimize  their  Bayes  risk  functions.  Recently,  the 
minimum  mean  optimal  sub-pattern  assignment  (MMOSPA) 
estimator  in  [5]  was  generalized  for  the  probability  hypothesis 
density  (PHD)  filter  [6],  Thus,  a  theoretical  basis  also  has  been 
established  for  the  commonly  used  k-means  clustering  method. 

The  multi-Bernoulli  assumption  on  the  RFS  of  targets 
represents  each  target  independently  by  a  parameter  pair  {q,  /} 
[8].  That  is,  for  each  target  an  independent  Bernoulli  RFS 
provides  a  unified  statistical  representation  of  target  existence 
via  the  probability  q  and  target  states  via  the  spatial  probability 
density  /  (x).  Using  the  multi-Bernoulli  RFS  representation, 
tractable  approximations  of  the  multi  target  Bayes  filter,  gen¬ 
erally  known  as  the  multi-target  multi-Bernoulli  (MeMBer) 
filters,  were  developed  [2],  [9],  [10].  In  addition,  the  Bernoulli 
RFS  formalism  was  used  in  the  development  of  an  exact  solu¬ 
tion  to  the  single-target  tracking  problem.  First,  the  integrated 
probabilistic  data  association  (IP DA)  filter  [11]  was  formulated 
as  an  RFS  based  Bayes  filter  [12],  Then,  this  RFS  formula¬ 
tion  was  extended  by  making  use  of  a  target  birth  model, 
state -dependent  detection  probability  and  arbitrary  false  alarm 
process  in  its  framework.  Thus,  the  joint  target  detection  and 
tracking  (JoTT)  filter  (also  known  as  the  Bernoulli  filter) 
was  developed  with  the  objective  of  estimating  the  target 
existence  probability  along  with  its  state(s)  [2],  [13],  For  more 
detailed  information  regarding  the  theory,  implementation  and 
applications  of  Bernoulli  filters,  interested  readers  are  referred 
to  [8], 

The  performance  of  tracking  algorithms  and  state  estimators 
can  be  evaluated  by  metrics  defined  in  terms  of  cardinality, 
time,  and  accuracy  [14],  [15].  The  performance  metrics  should 
be  determined  according  to  which  attributes  of  the  tracking 
algorithm  or  the  state  estimator  are  selected  to  be  monitored. 
For  example,  the  mean  OSPA  (MOSPA)  metric  is  appropriate 
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to  reduce  jitters  and  track  coalescence  [5],  [7],  In  addition, 
they  should  be  consistent  with  the  criteria  that  the  tracking 
algorithm  or  state  estimator  is  developed  to  optimize  [5],  [16], 
Based  on  these  facts,  it  is  important  to  point  out  that  the 
estimated  states  from  the  JoTT  filter  using  the  JoM  estimator 
is  identical  to  that  using  the  MaM  estimator  if  “target-present” 
decision  is  confirmed  by  these  two  estimators.  Therefore,  the 
performance  metric(s)  should  be  selected  so  as  to  monitor 
the  cardinality  and  time  attributes  of  these  two  estimators  re¬ 
garding  track  confirmation,  track  maintenance  quality  after  the 
target  birth,  and  track  termination.  There  are  numerous  metrics 
defined  in  terms  of  cardinality  and  time.  Nevertheless,  the 
OSPA  metric  is  defined  as  a  rigorous  and  robust  performance 
measure  for  the  (multi)target  Bayes  filters  [17],  [18], 

Even  though  the  MaM  estimator  is  used  in  MeMBer  type 
filters,  the  exact  use  of  the  JoM  estimator  with  these  filters 
has  not  been  studied  so  far.  In  this  paper,  we  propose  a  JoM 
estimator  to  obtain  the  estimate  of  the  target  RFS  from  the 
JoTT  filter.  The  proper  choice  to  the  unknwon  JoM  estimation 
constant  (i.e.,  hypervolume  around  target  state  estimate)  is 
obtained  as  a  Pareto-optimal  solution  to  a  multi-objective 
nonlinear  convex  optimization  problem.  The  multi-objective 
function  is  formulated  as  two  convex  objective  functions  in 
conflict.  The  first  objective  function  is  the  information  theo¬ 
retic  part  of  the  problem  and  aims  for  entropy  maximization, 
while  the  second  one  arises  from  the  constraint  in  the  definition 
of  the  JoM  estimator  and  aims  to  improve  the  accuracy  of  the 
JoM  estimates.  The  Pareto-optimal  solution  is  obtained  using 
the  weighted  sum  method  [19]— [22],  This  method  aggregates 
two  or  more  objective  functions  into  a  single  objective  function 
using  weights  selected  according  to  their  relative  importance. 
Then,  the  resulting  single-objective  optimization  problem  can 
be  solved  using  any  standard  optimization  technique  [19],  [20]. 

This  paper  is  organized  as  follows:  Section  II  provides  the 
necessary  background  on  information  theory  and  multitarget 
state  estimation.  In  Section  III,  the  Bayesian  optimal  multitar¬ 
get  estimators  (i.e.,  MaM  and  JoM  estimators)  are  presented 
along  with  their  evaluations  for  estimation  of  multitarget  states. 
The  proper  choice  to  the  JoM  estimation  constant  is  formulated 
in  Section  IV.  For  its  Pareto-optimal  solution,  linear  predic¬ 
tions  of  objective  weights  are  proposed  in  Section  V.  The 
implementation  of  the  JoM  estimator  for  the  JoTT  filter  under 
Gaussian  assumptions  is  presented  in  Section  VI.  Simulation 
results  are  shown  in  Section  VII.  Finally,  conclusions  and 
future  research  directions  are  given  in  Section  VIII. 

II.  Background 
A.  Concepts  in  Information  Theory 

In  the  following,  we  introduce  some  of  the  basic  concepts  of 
information  theory.  For  the  sake  of  completeness  and  clarity, 
we  also  summarize  how  each  concept  is  utilized  later. 

Entropy.  A  random  variable  is  statistically  characterized  by 
its  probability  density  function  (pdf).  In  traditional  statistics, 
variance  of  a  random  variable  is  used  to  measure  its  uncer¬ 
tainty.  However,  in  the  information  theoretic  sense,  entropy  is 
a  measure  of  the  amount  of  uncertainty  in  a  random  variable 
[23],  For  a  discrete  random  variable  x  characterized  by  the 


probability  mass  function  (pmf)  p  ( x )  over  its  sample  space 
X,  the  entropy  is  computed  as 

H  (.P)  = -^P(x)\°g{p{x)),  (1) 

x£X 

where  —  log(p(x))  is  called  the  self-information  obtained  by 
the  observation  of  x.  For  the  continuity  of  entropy,  0  log  (0)  = 
0,  and  thus  zero  probability  does  not  change  the  uncertainty 
in  x. 

Entropy  is  a  nonnegative  measure,  i.e.,  H  (x)  >  0  with  the 
properties  that  H  (x)  is  maximized  if  p  ( x )  is  uniform,  and 
H  ( x )  =  0  if  there  is  no  uncertainty  in  x,  i.e.,  p  ( x )  =  0  or 
1  [23],  Hence,  larger  entropy  means  that  less  information  is 
available  for  the  realization  of  a  random  variable  through  its 
pmf  [24], 

Differential  Entropy:  For  continuous  random  variables,  the 
information  theoretic  uncertainty  analogous  to  the  entropy  is 
called  the  differential  entropy,  and  is  defined  as 


H  (/)  =  —  f  f  {x)  log  (/  (x))  dx,  (2) 

J  S 

where  S  is  the  support  set  of  the  continuous  pdf  /  (x).  Unlike 
the  entropy,  the  differential  entropy  has  values  in  the  range 
[—oo,  oo].  Therefore,  its  standalone  value  cannot  be  interpreted 
as  the  amount  of  uncertainty  on  a  continuous  time  random 
variable.  Besides,  it  makes  sense  within  the  definition  of  the 
following  concepts. 

Entropy  and  differential  entropy  will  be  utilized  to  analyze 
uncertainties  related  to  the  cardinality  and  spatial  information 
in  a  FISST  density,  respectively.  Thus,  we  can  evaluate  how 
appropriate  the  MaM  and  JoM  estimators  are  for  estimation 
of  the  multitarget  states. 

Asymptotic  Equipartition  Property:  In  information  theory, 
the  weak  law  of  large  numbers  corresponds  to  asymptotic 
equipartition  property  (AEP)  [23],  That  is,  given  that  xi, ...,  xn 
are  independent  and  identically  distributed  (i.i.d.)  random 
samples  from  /  (x),  then  the  normalized  self-information  of 
this  sequence  weakly  converges  to  the  (differential)  entropy  of 
/  (x)  with  a  small  positive  tolerance,  i.e.,  r  >  0  if  n  is  large 
enough  to  satisfy  [23] 


Pr 


log/(xi,...rx„)  H  (/) 


>1  —  5, 


(3) 


where  5  — »  0  as  n  — >  oo  (proof  is  given  by  Chebyshev’s 
inequality).  The  collection  of  these  sequences  forms  typical 
set  A”.  Most  of  the  total  probability  is  contained  in  this  set, 
i.e.,  Pr  (A")  >  1  —  r  and  is  almost  uniformly  distributed  [23] 
as 

2~n(H(f)+r)  <  Pr  (x1;  ...,  x„)  <  2-nW/)-r).  (4) 


Hence,  if  any  statistical  conclusion  is  drawn  for  a  typical  set, 
it  would  be  true  in  general  with  high  probability  [23].  In 
addition,  the  volume  of  typical  set  is  almost  given  by  [23], 
[25] 

Vol(A^)  m  (5) 

Then,  the  larger\smaller  the  differential  entropy  is,  the  more 
/  (x)  disperses\concentrates  over  its  support  set  S.  Note  that 
the  typical  set  has  the  smallest  volume,  compared  to  all 
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possible  sets  that  contain  most  of  the  total  probability  [23], 
[24], 

Typical  set  of  a  standard  Gaussian  density  will  lead  us  to 
define  another  important  set,  where  the  sequences  of  mostly 
likely  state  estimates  exist.  Thus,  our  aim  would  be  the  entropy 
maximization  by  defining  a  uniform  density  over  this  set. 

Quantization :  The  relationship  between  the  entropy  and  the 
differential  entropy  is  established  by  quantization.  To  see  this, 
assume  that  the  range  of  a  continuous  random  variable  x  is 
divided  into  bins  of  A  where  f  (x)  is  continuous.  Then,  the 
entropy  of  the  quantized  random  variable  is  given  by 

OO 

H  (?)  =  -^2Pil°S(Pi) , 

—  OO 
OO 

=  -^2f(xi)A\og{f(xi)A),  (6) 

—  OO 
OO 

=  -J2f  ( Xi ) A  los  (/  Og))  -  loS  (A)  > 

—  OO 

where  the  first  term  approaches  —  fs  f  (x)  log  (/  (x))  as 
A  — >  0.  Thus,  for  n  bit  quantization  of  a  continuous  random 
variable,  i.e.,  A  =  2-n,  the  entropy  increases  with  n  as 

H(p)  =  H  (/)  +  n. 


This  means  that  in  order  to  represent  an  n-bit  quantized 
information  from  x  ~  /  (x)  the  average  number  of  bits 
required  is  H  (/)  +  n  [23], 

This  concept  will  be  utilized  to  analyze  the  entropy  of  a 
FISST  density  when  the  corresponding  RFS  is  quantized.  This 
analysis  demonstrates  an  important  fact  about  the  selection  of 
the  JoM  estimation  constant. 

Kullback-Leibler  Divergence  (Relative  Entropy)'.  Kullback- 
Leibler  (KL)  divergence  is  a  statistical  measure  of  the  differ¬ 
ence  of  a  model  or  a  theory  based  pdf  /  (a:)  from  a  true  or 
a  reference  pdf  ft  (x)  on  the  same  support  set.  If  ft  (x)  is 
absolutely  continuous  with  respect  to  /  (a:)  or  +oo  otherwise, 
KL  divergence  of  f  (x)  from  ft  (x)  is  defined  as 


K  (ft  II/)  =  Jft  ( x )  log  (JjJJJjdx, 

=  Jft  ( X )  log  ( ft  (x))dx -Jft  (ar)  log  (/  (x))dx, 


=  H(ft\\f)-H(ft), 


(7) 

where  the  first  term  measures  the  uncertainty  introduced  by 
using  a  model  or  theory  based  /  (a;)  instead  of  the  true 
or  reference  ft  (x)  while  the  second  term  is  the  differential 
entropy  of  ft( x).  Hence,  the  more  f  (x)  resembles  ft(x), 
the  less  is  the  information  lost  due  to  using  f  (x).  That  is, 
K  (f  \\  ft)  >  0  gets  smaller  values  with  equality  if  and  only 

if  /  (x)  =  ft  ( x ). 

KL  divergence  is  an  important  concept  used  in  the  develop¬ 
ment  of  other  consistent  concepts  in  information  theory.  For 
example,  mutual  information  is  a  special  case  of  KL  diver¬ 
gence  [23],  and  entropy  maximization  is  in  general  formulated 
as  the  minimization  of  KL  divergence  instead  of  Shannon’s 
entropy  given  by  (1)  and  (2)  [24],  [26]. 


With  the  help  of  other  relevant  concepts  KL  divergence  will 
be  utilized  to  define  the  information  theoretic  part  of  the  multi¬ 
objective  optimization  problem. 


B.  Multitarget  State  Estimation 

In  the  following,  we  exemplify  the  problems  of  the  MAP 
and  EAP  estimators  when  they  are  generalized  for  estima¬ 
tion  of  multitarget  states.  Then,  we  define  the  global  MAP 
estimators,  i.e.,  the  GMAP-1  and  GMAP-II  estimators,  which 
were  introduced  in  [27]  and  also  known  as  the  MaM  and  JoM 
estimators  in  [2],  [4],  respectively. 

Consider  the  scenario  in  [2],  [3],  where  a  Bernoulli  target 
moves  in  the  one  dimensional  interval  [0, 2]  with  units  given  in 
meters.  In  addition,  suppose  that  the  target  existence  probabil¬ 
ity  is  set  to  0.5  and  if  the  Bernoulli  target  does  exist,  its  spatial 
probability  density  is  uniform  over  [0,  2].  That  is,  suppose  that 
the  FISST  density  in  units  of  m“lxl  is 


f(X) 


0.5,  if  A  =  0 

0.25  m-1,  if 
0,  otherwise 


X  =  {x} 

0  <  x  <  2 


First,  we  try  to  obtain  the  MAP  estimate  using  XMAP  = 
arg  sup  /  (A).  However,  the  MAP  estimator  is  undefined  since 

A' 

/  (0)  =  0.5  cannot  be  compared  with  f  ({x})  =  0.25  m-1. 
This  problem  would  be  eliminated  by  converting  /  (A)  into  a 
unitless  quantity  by  multiplying  it  with  mlxL  Thus,  we  obtain 
the  MAP  estimate  as  XMAP  =  0.  However,  this  conversion 
results  in  a  paradox.  That  is,  if  the  Bernoulli  target  moved 
in  the  same  interval  with  units  given  in  kilometer  instead  of 
meter,  this  would  result  in  f  ({x})  =  250  m-1.  Thus,  we 
would  obtain  the  MAP  estimate  as  XMAP  =  {a’}  after  the 
conversion.  That  is,  the  change  in  unit  of  measurements  from 
m  to  km  also  changes  the  MAP  estimate  [2],  [3], 

Now,  using  the  set  integral  we  try  to  obtain  the  EAP  estimate 
from 

XEAP  =  J  Xf  (A)  SX, 

2 

=  0/(0)  +  J  xf{{x})dx, 

o 

=  0.5  (0  +  1  m) . 


As  indicated  in  [2],  [3],  the  EAP  estimator  faces  additional 
problems  arising  from  ill-defined  arithmetic  operations  on 
sets.  Therefore,  like  the  MAP  estimator,  the  EAP  estimator 
is  undefined  when  generalized  for  estimation  of  multitarget 
states. 

The  GMAP-I  and  GMAP-II  are  Bayesian  estimators,  which 
are  defined  according  to  the  minimization  of  the  following  cost 
functions  [27] 


C0(X,Y) 


0,  if  |A|  =  \Y\ 
1.  if  m  ±  \Y\ 


(8) 


and 

C  (A,  Y)  =  C0  (A,  Y)  +  Cx  (A,  Y) ,  (9) 
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respectively.  The  second  cost  function  in  (9)  takes  into  account  measure  theoretic  probability,  the  differential  entropy  of  an 
the  spatial  information  in  a  FISST  density,  i.e.,  RFS  X  is  given  by  [28],  [29] 


s  =  r, 

o,  if  <  , 

{  (xi, ...,  xs)  =  (yfo,  •••,  ypr)  £  K 
1,  otherwise 

where  the  hybrid  RFSs  are  defined  as  X  =  {£i,  •••, £s} 
and  Y  =  {£i,...,£r}  with  their  identities  (£i,...,£a)  and 
i.e.,  &  =  fori  =  l,...,s  and  Q  = 

for  i  =  l,...,r.  The  RFSs  consisting  of  Wx,y  £  Rn  are 
surrounded  by  a  closed  ball  K  in  (R'")'  and  are  associated 
through  a  one-to-one  function  given  by  /3  :  (£i,...,£a)  — > 
(ipi, (pr).  Thus,  the  cost  function  in  (8)  just  weights  the  car¬ 
dinality  discrepancy,  whereas  the  cost  function  in  (9)  weights 
both  the  cardinality  and  spatial  discrepancies.  These  properties 
of  the  GMAP-I  and  GMAP-II  estimators  will  help  us  in 
evaluating  the  corresponding  MaM  and  JoM  estimators  for 
estimation  of  multi  target  states. 


Ci(X,Y)  = 


III.  Multitarget  Bayes  Estimators 


For  RFSs  with  different  cardinalities,  their  FISST  den¬ 
sities  have  incommensurable  scales  (i.e.,  different  physical 
dimensions).  Furthermore,  addition  and  subtraction  operations 
on  RFSs  are  not  defined  properly.  Therefore,  the  multitarget 
analogues  of  the  MAP  and  EAP  estimators  are  undefined  [2]- 
[4],  [27].  Nevertheless,  two  MAP  like  multitarget  estimators 
were  proposed  for  FISST  densities.  In  the  following,  we 
show  how  multitarget  states  are  obtained  using  these  Bayes 
estimators.  In  addition,  we  evaluate  how  appropriate  they 
are  for  this  purpose  based  on  the  results  obtained  from  the 
analysis  of  uncertainties  related  to  the  cardinality  and  spatial 
information  in  a  FISST  density. 

Marginal  Multitarget  (MaM)  Estimator :  The  MaM  estimate 
of  an  RFS  is  computed  in  a  two-step  procedure:  first,  the  MAP 
estimate  of  the  cardinality  is  determined: 

nMAP  ±  arg  sup  pm  (n) ,  (10) 


where  |X|  denotes  the  cardinality  variable  for  the  RFS  X  and 
is  characterized  by  its  probability  mass  function.  That  is,  the 
cardinality  distribution  of  the  RFS  X,  given  that  Ztk  '>  is  the 
RFS  of  measurements  at  time  k,  is 


P\x\(n)  =  ^  Jfk\k  Z(fc))dxi...dxn.  (11) 


Then,  the  MAP  estimate  of  the  multitarget  states  is  deter¬ 
mined  from  the  corresponding  FISST  posterior  density  for  the 
given  cardinality  estimate  n  =  hMAP  as 


=  arg 


sup  fk\k 


({*!) 


}  Z(k)^j 


(12) 

The  MaM  estimator  is  Bayesian  optimal  [2],  [4],  [27]. 
However,  it  does  not  utilize  all  the  information  contained  in  the 
multitarget  posterior  density.  Hence,  it  would  be  statistically 
unreliable  when  the  target  number  is  related  to  the  spatial 
information  in  the  FISST  posterior  density  [2],  [4].  That  is, 
using  the  relationship  between  the  FISST  probability  and 


H  (. fx )  =  l0§  (vlXlf  C X ))  SX, 

oo  1 

=  ni  f  ({xi,-,x„})x 

n= 0  J 

log  ( vnf  ({xi, ...,  x„}))  dxi...dxn, 

(13) 

where  iHyI  is  the  unit  of  the  FISST  density  /  (X).  Note  that 
the  dependence  of  the  FISST  posterior  density  on  the  RFS 
Z(k)  is  dropped  here  for  conciseness. 

Substituting  f  ({x i,  =  n\p\x\  (n)  f  (xlt xn) 

into  (13)  yields 

OO  r. 

H  (fx)  =  -^2p\x\  (n)  /  f(x  i,...,xn)x 

n= 0  ^ 

log  {n\vnp\x\  (n)  f  (xi, ...,  xn))  dxi...dxn, 

(14) 

and,  after  some  algebraic  manipulations,  the  differential  en¬ 
tropy  may  be  rewritten  as  the  sum  of  the  three  terms,  i.e.. 


H  (fx)  = 

OO  r. 

-^Zpl-Yl  los(PlYl  (n))  f  {xi,...,xn)dx1...dxn+ 

^ — n  J 


n— 0 
oo 


~  Y.P\x\  jn)  f{xi,...,xn)\og  (vnf(x1,...,xn))dx1...dxn+ 

n—0  ' 

oo 

~  X!pl-Yl  log  in')  f  (xi,  ...,xn)  dx!...dxn, 

n—0  ' 

(15) 

where  the  first  term  is  the  entropy  of  the  cardinality  distribu¬ 
tion: 


OO 

H  (P)  =  J2p\x\  (n) log  ( P\x\  (n) 

n—0 
oo 

=  -  p\x\  (n) log  (p\- x\  (n))  > 

n—0 

and  the  second  term  is  the  average  differential  entropy  of  the 
joint  pdf  of  xi,  ...,xn  over  p\x\  (n): 

OO 

E[H  (fx,n)}  =  Y,P\x\^H  (fX,n)  ■ 

n= 0 

The  probability  assigned  to  the  FISST  density  with  cardinality 
n,  i.e.,  fx.n  =  f  ({xi, ...,  xn}),  is  uniformly  distributed 
among  joint  pdfs  fx.n  =  /  (xi, ...,  xn)  of  n\  possible  vectors 
for  all  permutations  of  {xi, xn},  i.e.,  fx^n  are  symmetric 
joint  pdfs  of  (xCTi, ...,  xan),  where  a  indicates  the  permutation 
on  the  numbers  {l,...,n}  [2],  [29].  Hence,  the  third  term 
indicates  the  information  uncertainty  due  to  change  in  the 
representation  from  RFSs,  i.e.,  {xi,...,x„},  to  vectors  of 
indistinguishable  points,  i.e.,  (xi,...,xn)  [28],  [29]: 


) Jf  (xi, ...,  xn)  dxi...dxn, 


E  [log  (n!)]  (n)  log  (n!) , 

71=0 
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The  MaM  estimator’s  cost  function  only  penalizes  the 
cardinality  discrepancy  between  the  true  RFS  and  its  estimate 
[21].  Therefore,  the  MaM  estimator  determines  multi  target 
states  without  considering  the  uncertainty  represented  by  the 
second  and  the  third  terms  in  the  FISST  densities. 

Joint  Multitarget  (JoM)  Estimator.  In  contrast  to  the  MaM 
estimator,  the  JoM  estimator  determines  the  target  number 
and  multitarget  states  simultaneously  from  the  FISST  posterior 
density  [2]  as 

XJoM  =  arg  sup  fk\k  [X  Z ^  (16) 

where  the  parameter  e  denotes  a  small  constant  (hereinafter 
called  as  the  JoM  estimation  constant)  and  satisfies  that 
f  ({x i,  ...,xn})sn  <  1  for  all  integers  n>  0.  However,  there 
is  a  trade-off  in  the  selection  of  e.  That  is,  smaller  values 
of  e  yield  better  accuracy  in  multitarget  state  estimates,  but 
with  slower  convergence  to  the  true  multitarget  states  [2],  [4], 
In  Appendix  A,  information  theoretic  analysis  demonstrates 
that  the  uncertainty  in  multitarget  state  estimates  cannot  be 
improved  by  selecting  too  small  values  for  e. 

Alternatively,  the  JoM  estimator  can  be  performed  in  a  two- 
step  procedure  [2],  First,  for  integer  values  n  >  0  the  MAP 
estimates  of  the  RFSs  are  computed  from  the  corresponding 
posterior  FISST  densities: 

Xn  =  arg  sup  /  ({xi,...,xn}  Z(fc)  V  (17) 

Xl,...,Xn  '  ' 

Then,  using  Xn  for  each  n,  the  JoM  estimate  is  determined 
as  XJoM  =  Xn,  where  n  denotes  the  solution  to  the  following 
maximization  problem: 

r  \  en 

n  =  arg  sup/  ({fr,  Z(k)  )  .  (18) 

Like  the  MaM  estimator,  the  JoM  estimator  is  Bayesian 
optimal  [2],  [4],  [27].  However,  it  is  naturally  more  appropriate 
for  the  estimation  of  multitarget  states  since  its  cost  function 
penalizes  both  discrepancies  in  cardinality  and  multitarget 
states  [27].  In  addition,  it  is  known  that  the  JoM  estimator 
is  statistically  convergent  [2],  [4]. 

IV.  Optimization  of  the  JoM  Estimation  Constant 

The  differential  entropy  of  a  pdf  is  roughly  represented  by 
a  uniform  density  over  its  typical  set  [23],  [25].  However, 
typical  sets  do  not  include  the  sequences  of  all  the  most  (least) 
probable  state  estimates  [23],  [25].  For  example.  Fig.  1  shows 
the  cross-section  of  the  typical  set  of  a  standard  Gaussian 
density  around  a  hypersphere  centered  at  the  origin  of  Wl[r 
[25],  [30],  It  can  be  seen  that  the  typical  set  is  represented  by 
a  thin  shell  bounded  by  two  convex  sets  (see  Appendix  B). 
Instead,  for  log-concave  pdfs  (e.g.,  a  Gaussian  pdf)  superlevel 
sets  can  be  defined  so  as  to  include  the  sequences  of  most 
likely  state  estimates  [30],  [31]: 

Sx  =  {x  G  Kria:|/(ii,...,iri)  >  A},  (19) 

where  xi, ....  xn  are  i.i.d.  samples  drawn  from  the  log-concave 
pdf  /  ( x ),  and  A  is  the  supremum  value  of  the  uniform 
probability  on  the  typical  set  for  a  small  positive  constant 


r,  i.e.,  A  =  e~n^H^~T^  [23],  where  H  (/)  is  in  nats.  In 
particular,  if  x  is  Gaussian-distributed  with  mean  ft  and  co- 
variance  matrix  P  in  R™*,  i.e.,  x  ~  N  (/z,  P ),  then  substituting 
H  {f)  =  0.5  log  ((27re)na:  \P\)  [23]  for  A  yields 

A=  ({2n)n*  |P|)“"/2e-n(^-r), 

and  the  joint  probability  distribution  of  i.i.d.  samples  are  given 
by 

n 

f(x X„)  =  /(*»), 


=  f(x)r 


where  /  (x)  =  ((2n)nx  \P\T^\ 

Thus,  the  superlevel  set  given  by  (19)  can  be  alternatively 
defined  as 


-y>«- lYTP  1{%i  -H)  <nx~  2t  >  . 


(20) 

In  general,  this  bounded  and  closed  set  includes  the  sequences 
of  most  likely  random  samples  drawn  from  /  (x).  However, 
our  aim  is  to  define  a  confined  set  that  exclusively  consists 
of  good  state  estimates  from  the  JoM  estimator.  To  this  end, 
the  superlevel  set  in  (20),  when  evaluated  at  n  =  1,  gives  the 
least  upper  bound  for  this  special  subset  as 

=  |  x  €  R™x  |  (x  —  n)T  P-1  (x  —  n)  <  nx  —  2t|  , 

(21) 

where  0  <  2r  <  nx.  This  means  that  S\  ’  is  a  hyperellipsoid 
(i.e.,  a  convex  set)  with  the  centroid  at  /r  in  the  region 
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surrounded  by  the  inflection  points  of  the  Gaussian  density 

/  (x). 

The  entropy  maximization  helps  ignore  spurious  details  like 
tail  probabilities  and  side-lobes  for  which  samples  from  these 
parts  can  be  hardly  ever  observed  [32],  Over  bounded  and 
closed  sets,  the  entropy  maximization  is  achieved  by  uniform 
densities  [23],  Then,  the  KL  divergence  of  /  (x)  from  the 
uniform  density  defined  on  S^\  i.e.,  u  (x)  =  e:/1  is  given  by 

K(u  ll/)  =  /»Wlog(^M)*,  (22) 

=  H  («ll/)-log(£A), 

where  log  (eA)  is  the  differential  entropy  of  u  (x)  =  eT1  ,  i.e., 

H  ( u )  =  log  (eA),  and 

H  (u  ||/)  =  -  log  (/  (x))  +  F )TP~\x  -  n)dx, 

2£A  Jex 

<  -  log  (/  (x))  +  ^(nx-  2 r) , 

where  the  last  inequality  follows  from  (21).  Thus,  the  KL 
divergence  in  (22)  can  be  rewritten  as 

K  (u  ||/)  <  -  log  (/  (x)  ex)  +  1  (nx  -  2 r) ,  (23) 

where  the  first  term  on  the  right  hand  side  is  the  approximated 
KL  divergence  of  /  (x)  from  u  ( x )  =  £/ 1  when  eA  takes  so 
small  values,  i.e.,  nx  —  2r  — >■  0.  Note  that  the  sum  on  the  right 
hand  side  of  (23)  is  always  nonnegative  since  I\  (u  \\f)  >  0 
on  eA. 

The  volume  of  the  hyperellipsoid  .S’^1 1  can  be  expressed  in 
terms  of  r  as  follows  [33]: 

ex  =  C{nx)  |P|17V*/2,  (24) 

where  r  =  nx  —  2r  is  the  critical  value  for  the  total  probability 
of  /  (x)  in  the  hyperellipsoid,  and  C  (nx)  is  the  volume  of  the 
hypersphere  with  the  unit  radius  in  Rnx . 

After  substituting  for  eA  into  (23),  the  problem  at  hand  (i.e., 
determining  the  optimum  volume  of  the  hyperellipsoid)  can  be 
formulated  as  a  nonlinear  convex  optimization  problem  that 
determines  the  optimum  value  of  r  for  the  least  upper  bound 
of  the  KL  divergence.  That  is, 

minimize  foJ  (r)  =  -log  (/  (x)  £X)  +  \{jix~  2 r) , 
subject  to  g-\  (r)  =  — r  <  0, 

52  (r)  =  -(nx-  2 t)  +  7min  <  0, 

(25) 

where  7min  is  a  small  constant  determined  according  to  the 
chi-square  table,  considering  the  degree  of  freedom  (i.e.,  nx) 
and  the  probability  of  the  confidence  level  indicating  the 
smallest  hyperellipsoid,  e.g.,  Pr  (( nx  —  2 r)  >  7mi„)  >  95%. 

The  convex  optimization  problem  in  (25)  is  solely  for¬ 
mulated  in  terms  of  information  theoretic  sense.  In  other 
words,  the  objective  function  /0,/(r)  in  (25)  is  minimized 
as  nx  —  2t  — >  nx  (see  Appendix  C  for  proof).  Thus,  the 
computation  of  the  least  upper-bound  on  the  KL  divergence 


through  the  optimization  problem  in  (25)  corresponds  to  the 
minimization  of  information  gain  in  magnitude  measured  by 

K  (u\\f)<H(f)- log(ex). 


In  the  JoM  estimator,  the  selected  hyperellipsoid  surround¬ 
ing  the  estimated  states  of  targets  should  satisfy 


/  f  ({xi,...,xn})dx1...dxn=  /  /  (xi,...,xn)dxi...dxn, 

Jel  Jel 

=  f(x  i,...,xn)e", 

(26) 

where  the  first  expression  follows  from  f  ({x\, xn})  = 
n\  f  (x i,...,xn)  and  implies  that  the  volume  of  the  hyperel¬ 
lipsoid  e\  for  each  target  should  be  so  small  that  only  one 
permutation  of  the  RFS  is  possible  in  the  product  space  e”, 
i.e.,  {xi, xn}  =  (xi,...xn)  [27].  However,  as  indicated  in 
[2],  setting  e”  to  extremely  small  values  would  be  impractical 
without  considering  the  information  provided  by  /(x).  In 
other  words,  u  (x)  would  be  more  informative  than  /  (x) 
as  nx  —  2t  — ^ >  0.  However,  this  contradicts  the  information 
theoretic  part  of  the  optimization  problem  in  (25),  which  aims 
for  entropy  maximization  by  minimizing  information  gain 
obtained  using  u  (x)  instead  of  /  (x). 

In  contrast  to  single-objective  optimization,  there  is  usually 
no  unique  solution  that  simultaneously  achieves  the  optimiza¬ 
tion  of  more  than  one  objective  function.  Instead,  in  multi¬ 
objective  optimization  problems,  Pareto-optimal  solutions  can 
be  computed  according  to  the  relative  importance  of  individual 
objective  functions  [19],  [20].  For  a  vector  of  conflicting 
objective  functions  given  by  F(x)  =  [/i(x), ...,  /at(x)]  a 
solution  x*  is  said  to  be  Pareto  optimal  if  there  does  not  exist 
another  solution  that  dominates  it  [19].  That  is,  given  that  T 
is  the  feasible  design  space,  there  is  no  another  point,  x  £  T 
satisfying  F(x)  <  F(x*)  and  fi(x)  <  fi(x*)  for  at  least 
one  objective  function.  There  are  multiple  methods  for  multi¬ 
objective  optimization  problems.  However,  the  conversion  of 
the  multi-objective  problem  into  a  single-objective  problem  is 
the  standard  way  of  solving  [19],  [20]. 

To  determine  the  optimum  value  of  r,  two  objective  func¬ 
tions  faj  (t)  and  f0  j  (t),  which  quantify  entropy  maximiza¬ 
tion  and  the  accuracy  of  the  JoM  estimator,  respectively,  are 
in  conflict  with  one  another.  An  optimization  problem  with 
a  single  convex  objective  function  can  be  defined  by  aggre¬ 
gating  them  with  appropriately  selected  weights.  However,  a 
consistent  Pareto-optimal  solution  to  this  optimization  prob¬ 
lem  requires  the  normalization  of  these  conflicting  objective 
functions  in  different  magnitudes  [20],  [21].  To  this  end, 
their  extreme  values  are  calculated  at  the  vertex  points  of 
the  Pareto-optimal  set  [20].  Specifically,  for  the  problem  at 
hand,  first  set  r  =  0  to  obtain  the  minimum  of  fQ  j  (r),  i.e., 
pMm  wjjjje  setting  /0  j  (r)  to  its  maximum  value,  i.e.,  F^fx. 
Then,  set  r  =  0.5  (nx  —  7min)  to  obtain  F^!rax  and  F^jn  for 
foj  (r)  and  f„tj  (r),  respectively.  Finally,  the  following  robust 
normalization  is  performed  for  these  conflicting  objective 
functions  [20],  [21]: 


fTrans 

Jo,i  (T) 


/o,£  (r)  -  FaM™ 

zpMax  TpMin 

ro,i  ro,i 


V?  g  {I,  J} . 
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Thus,  an  optimization  problem  with  a  single  convex  objective 
function  can  be  obtained  as  follows: 

minimize  fm  (r)  =  (r)  +  (r) , 

subject  to  gi  (r)  =  — r  <  0,  (27) 

P2  (r)  =  -(nx-  2 r)  +  7min  <  0, 

where  /Jjans  is  the  normalization  of  the  objective  function 
defined  as 

(fix  2r )  if  ( Tlx  2r )  >  7min 

0  otherwise, 

considering  the  accuracy  of  the  JoM  estimator. 

In  this  paper,  the  weights  of  the  conflicting  objectives  are 
determined  as  linear  predictions  from  autoregressive  (AR) 
models.  The  next  section  presents  details  about  this  process. 
However,  the  weights  can  also  be  chosen  depending  on  the 
application  and  preference  of  decision  maker(s)  [19],  [20]. 

The  nonlinear  convex  optimization  problem  in  (27)  can  be 
solved  using  any  standard  nonlinear  optimization  technique 
[19].  In  addition,  the  solution  is  strictly  Pareto  optimal  for  the 
positive  weights  of  the  convex  objective  functions  [20],  [21]  . 
In  this  paper,  the  sequential  quadratic  programming  (SQP)  is 
employed  to  find  a  Pareto-optimal  solution  to  (27).  The  SQP 
iteratively  solves  a  quadratic  approximation  to  the  Lagrangian 
function,  in  the  sense  that  the  sequence  of  solutions  approaches 
to  optimal  solution  satisfying  the  necessary  Karush-Kuhn- 
Tucker  (KKT)  conditions  [34],  [35].  Note  that  there  are  many 
other  ways  to  solve  the  above  multi-objective  optimization 
problem.  The  contribution  of  this  paper  is  not  in  optimization, 
but  in  multitarget  detection  and  state  estimation.  Thus,  we 
have  used  a  standard  optimization  approach  that  guarantees 
a  Pareto-optimal  solution  without  exhaustive  comparison  with 
other  approaches. 

In  order  to  illustrate  the  geometrical  interpretation  of  the 
weighted  sum  method,  let  us  examine  the  nonlinear  convex 
optimization  problem  in  (27)  with  the  following  parameters: 
P  =  diag  ([50, 50, 10, 10]/),  nx  =  4  and  7min  =  0.297 
with  the  confidence  probability  of  99.9%.  Considering  the 
inequality  constraints  in  (27)  the  feasible  design  space  of 
r,  i.e.,  T  =  {r  |pi  (r)  <  0,  *  =  1,  2}  is  obtained  as  T  = 
[0,  1.8515]  [20],  [22].  Thus,  the  feasible  criterion  space  of 
the  vector  of  the  normalized  objective  functions,  i.e.,  F  = 
[fTrans  ^  )  jTrans  (T)]  is  defined  as  fl  =  {F  |r  g  T}  [20], 
[22].  Fig.  2  shows  the  relationship  between  the  Pareto  front 
and  the  normalized  objective  functions  in  the  feasible  criterion 
space.  The  Pareto  front  is  the  set  of  the  non-dominated  points, 
i.e.,  Pareto-optimal  points  in  the  criterion  space  [20],  As  can 
be  seen  in  Fig.  2,  the  Pareto  front  is  a  convex  curve.  Thus, 
a  Pareto-optimal  point  can  always  be  obtained  depending  on 
the  weights  of  the  conflicting  objective  functions  [22],  [36]. 
This  is  because  for  a  given  set  of  weights,  the  weighted  sum 
method  approximates  the  Pareto  front  as  a  line  [36]: 


conflicting  objective  functions  are  considered  equally  impor¬ 
tant,  i.e.,  wi  =  wj  =  0.5.  Thus,  the  Pareto-optimal  point  in 
the  feasible  design  space  is  computed  as  F  =  [0.1747, 0.1687]. 
As  expected,  the  normalized  objective  functions  in  conflict  are 
penalized  almost  equally.  In  Fig.  2,  the  line  with  the  slope  —  1 
is  tangent  to  the  Pareto  front  at  F  =  [0.1747,0.1687]  and 
locally  approximates  the  convex  Pareto  front. 


Fig.  2.  Geometrical  interpretation  of  the  weighted  sum  method  in  the  feasible 
criterion  space. 


V.  Linear  predictions  of  objective  weights 

AR  models  predict  the  current  output  of  a  stochastic  process 
based  on  its  previous  outputs.  The  AR  model  of  order  N, 
denoted  as  AR  (N),  is  in  general  defined  by  [37] 

En 

CtiXk-i  +  Vk, 

1=1 

where  c  denotes  a  constant  for  a  non-zero  mean  value  of  Xk, 
{ai}i-i  aie  predictor  coefficients  and  is  a  white  noise 
representing  prediction  error  with  zero  mean  and  variance  <j|. 
For  linear  predictions  of  the  objective  weights,  we  use  the 
following  AR  (1)  model: 

wk  =  c  +  awk-i  +  #fc,  (28) 

where  the  predictor  coefficient  indicates  linear  relationship  in 
this  time  series.  For  a  wide  sense  stationary  (WSS)  process,  the 
condition  |a|  <  1  must  be  satisfied.  In  this  case,  the  Af?(l) 
model  is  statistically  characterized  by  [37] 


rTrans  /'_\  xTrans  /  \  ,  ^  r 

To, I  \T)  = - Jo,J  (T)  H - Jm  {T  )  , 

wi  ’  wi 

where  t*  denotes  a  Pareto-optimal  solution.  For  example,  the 
SQP  finds  the  Pareto-optimal  solution  as  r*  =  1.1674  if  the 


E  [wfe]  =  /iw  =  - - , 

1  —  a 

2 

/  \  2 

var  (Wk)  =  crw  =  - - 7 

1  —  cr 

CO  v(wk,wk_i)  =  al,a\ 


Thus,  the  autocorrelation  function  between  wk  and  wk-i 
decays  to  zero  by  a1  as  i  — >  oo.  This  means  that  the  AR  (1) 
model  is  also  stable,  i.e.,  represents  a  predictable  process. 

The  objective  function  fa  j  (r)  in  (27)  only  considers  the 
degree  of  freedom,  i.e.,  nx  because  of  the  definition  of  the 
hyperellipsoid  in  (21).  Thus,  substituting  (24)  into  (26)  for  a 
Bernoulli  target  with  parameter  pair  {qk,  fk}  over  the  volume 
£\  results  in 


/  fk  (M)  dx  =  qkfk  (x)  e\, 
J 


=  qk 


2n"/2r(^) 


(nx  -  2 r)n*/2, 


where  fk  ( x )  is  a  Gaussian  pdf  and  T  (•)  denotes  the  gamma 
function.  Notice  that  the  approximation  is  independent  of  P 
at  time  k,  denoted  as  If.  To  consider  the  covariance  of  fk  ( x ) 
implicitly  in  this  approximation  we  determine  the  degree  of 
correlation  between  Wjj:  and  Wj^-i  as 


A 


|A-il1/2 

IAI1'2 


1.4  (ih) 


where  the  first  term  is  the  ratio  of  infinitesimal  volumes  to 
locate  a  Bernoulli  target  with  the  same  spatial  probability  at 
time  k  and  k  —  1,  respectively  and  1  a  denotes  an  indicator 
function  defined  on  the  set  A  =  [gmin,  1]  [2].  The  indicator 
function  neglects  changes  in  Pk  before  confirming  a  Bernoulli 
target  with  the  threshold  qmin.  Thus,  we  keep  the  weights 
at  their  initial  states  until  a  probable  Bernoulli  target  is 
confirmed.  In  addition,  for  a  stable  process  the  correlation 
must  decay  to  zero  as  time  lag  increases.  For  this  purpose, 
we  set  a  =  Pk  in  (28)  within  its  control  limits  as  shown  in 
Fig.  3. 


Fig.  3.  Predictor  coefficient  of  AR(1)  model  versus  the  degree  of  correlation 
between  successive  weights. 


At  this  point,  it  is  important  to  note  that  our  AR  (1) 
model  with  the  predictor  coefficient  evolving  in  time  does  not 
represent  a  WSS  process.  However,  it  would  turn  into  a  WSS 
process  after  the  optimal  JoTT  filter  converges  to  its  steady- 
state  with  detections.  Then,  the  predictor  coefficient  is  set  to 
a  =  0.9  according  to  Fig.  3  since  successive  changes  in  Pk 
would  be  small.  Thus,  the  linear  predictions  monotonically 
approach  to  p,wj  =  10cj,  where  0.1  <  pw.j  <  0.9  in  order 
to  prevent  that  one  objective  completely  dominates  another  in 
the  multi-objective  optimization.  Since  fk  (x)  is  very  peaky 
after  the  convergence,  faj  (r)  becomes  more  important  than 
foj  (t)  in  (27).  Hence,  pw.j  is  set  to  its  maximum  value,  i.e., 
fiw,j  =  0.9  by  cj  =  0.09. 

Using  witk  +  wjtk  =  1,  the  the  Af?(l)  model  for  wj,k  is 
defined  by 

wi,k  =  0.01  +  atuj,fe_i  +  uk, 

where  vk  is  a  white  noise  with  zero  mean  and  variance  of  j  = 
aw  j  since  vi,k  =  — Similarly,  after  the  convergence  its 
linear  predictions  monotonically  approach  to  /iwj  =  0.1. 

On  the  other  hand,  the  optimal  JoTT  filter  gradually  dete¬ 
riorates  after  target  death.  Therefore,  Pk  takes  values  close 
to  zero  and  with  a  =  0.1  the  linear  predictions  for  wj ^ 
and  Wj:k  monotonically  approach  to  their  opposite  means, 
i.e.,  =  0.1  and  fiwj  =  0.9,  respectively.  Consequently, 

fo,i  (t)  becomes  more  important  than  f0j  (r)  in  (27)  as 
fk  (x)  disperses  over 


VI.  Implementation  of  the  JoM  Estimator  for  the 
JoTT  Filter 


Suppose  that  at  most  one  target  is  present.  In  this  case,  the 
RFS  of  a  single  target  can  be  modeled  as  a  Bernoulli  RFS 
with  the  parameter  pair  (qk- i,  fk- 1).  Thus,  its  FISST  density 
is  parameterized  as 


fk- 1  (X) 


1  —  qk-i  ifX  =  0 
qk-ifk-i  (x)  if  X  =  {ir}  , 


(29) 


where  qk-i  is  the  existence  probability  of  the  target,  and 
fk- i  (x)  is  its  spatial  pdf  if  the  target  is  present. 

In  the  prediction  step  of  the  JoTT  filter,  the  FISST  density 
fk~  i  (X)  propagated  to  time  k  is  parameterized  as  follows 
[2],  [13]: 


qk\k-i  =Pb  (1  -  qk-i)  +  Qk-i  J PS,k-i  (x)  fk-i  (x)dxk-i, 


fk\k  —1  (*t)  — 


1 


Qk\k-1 


(30) 

[(1  -  qk-i)pBbk  (x)  +  qk- 1  (f,Ps^)\ , 

(31) 

where  a  newborn  target  is  declared  with  probability  ps  ac¬ 
cording  to  a  birth  density  bk  (x),  i.e.,  the  Bernoulli  parameter 
pair  (pB,bk),  and 

(f,Psip)  =  J  fk- 1  (x)ps,k-i  (x)ifk\k-i  (•  \x)dxk~i, 


where  ps,k- 1  (x)  is  the  state-dependent  target  survival  proba¬ 
bility  and  if  the  target  survives,  its  states  evolve  according  to 
the  Markov  state  transition  density  ipk\k-i  (•  |x). 
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Suppose  that  the  single-sensor  multitarget  measurements  at 
time  k  are  modeled  as 


Zfe  =  rfc  (a:)UCk, 


where  C k  is  the  RFS  of  i.i.d.  false  alarms  and  Yk  (x)  is 
the  Bernoulli  RFS  of  target-originated  measurement  with  the 
parameter  pair  (pp(x),  gk{z\x)),  where  pd{x)  is  the  detection 
probability,  and  gk(z \x)  is  the  measurement  likelihood  func¬ 
tion. 

In  the  original  derivation  of  the  JoTT  filter,  the  false 
alarm  process  is  modeled  as  an  arbitrary  RFS.  If  the  Poisson 
false  alarm  RFS  with  mean  rate  Ac  and  spatial  pdf  c  (z)  is 
substituted  for  the  arbitrary  false  alarm  RFS,  the  original  data 
update  equations  of  the  JoTT  filter  defined  in  [2],  [13]  have 
the  form  of 


i  -  fk\k~i  M  +  E  A|fc-1ST(Z|')1 


Qk\k 


z£Zk 


1k\k-l  ~  fk\k-l  [i PD ]  +  E 

z£Zk 


If) 

fk\k-i  .Pi>Uk(z\; )] 
k(z) 


(32) 


fk\k  (*^) 


l  -PD  ( X)+PD  (x)  E 

_ z<£Zk 

i  -  /fe|fe-i  [pd }  +  e  /fc|fc-1Lp(T(z|,)1 

zezk 


fk\k  —  1 


(x) . 
(33) 


where,  in  general,  fk |fe_x  [x]  =  f  x  fk \k_1  {x)dx  and  k(z)  = 
Xc  c  (z)  is  the  intensity  function  of  the  Poisson  false  alarm 


RFS. 


For  the  JoM  estimator,  the  Bayesian  risk  function  to  be 
minimized  is  given  by  [27] 


c  (X,  J  (Z))  f  (X)  SX  «  2  -  plxl  (I  J|)  -  f(X^J\ 

(34) 

where  J  denotes  the  JoM  estimator,  C  is  the  cost  function 
that  penalizes  both  discrepancies  in  cardinality  and  multitarget 
states,  and  p\x\  (|  J|)  is  the  cardinality  distribution  evaluated 
at  the  target  number  |J|. 

Then,  using  the  updated  Bernoulli  parameters  from  the  JoTT 
filter,  the  JoM  estimator  confirms  the  presence  of  a  single 
target  if 


2  —  (1  —  qk |fc )  >  2  -  qk\k  -  qk\k  fk\k  (x)  e,  (35) 

where  the  left  hand  side  is  the  Bayes  risk  function  evaluated 
for  the  “no-target”  case,  i.e.,  X  =  0  and  the  right  hand  side 
is  the  Bayes  risk  function  evaluated  for  the  “target-present” 
case,  i.e.,  X  =  {x}.  Solving  this  inequality  for  qk\k  yields  the 
following  test  for  “target-present”  decision: 

Qk\ k  >  7Z——f - JZZ--  (36) 

2  +  fk\k  (x)  £ 

As  in  the  original  JoM  estimator,  first,  the  MAP  estimate  of 
X  =  {x}  is  computed  from  the  parameterized  FISST  density, 
i.e.,  (qk\k ,  fk\k)  where  the  spatial  pdf  fk\k  has  the  Gaussian 
mixture  form,  i.e.,  /fc|fc  (x)  =  wk\kfk\l-  with  the  mixing 

weights  satisfying  that  E£l  wk\k  ~  1-0.  Before  state  estima¬ 
tion,  pruning  and  merging  of  the  Gaussian  components  are 
performed.  Thus,  the  state  estimation  is  obtained  using  the 
well-separated  and  significant  Gaussian  density  components 


according  to  (17).  For  the  selected  Gaussian  density  compo¬ 
nent,  its  Pareto-optimal  volume  given  by  Tpj0pt  =  qk\k£P,opt 
is  computed.  Then,  the  test  for  “target-present”  decision  in 
(36)  is  checked  using  £p,opt-  That  is,  fk\k  (x)  £pt0pt.  is  set  to 
min  (fk\k  C x)£p,opt,l/qk\k )•  Consequently,  if  target  is  pro¬ 
gressively  better-localized,  all  of  its  probability  mass  would 
be  almost  located  in  £p,opU  i.e.,  qk\k  Sk\k  (x)£p,opt  «  1  [2]. 


VII.  Simulation  Results 


In  this  section,  the  proposed  JoM  estimator  is  compared 
with  the  MaM  estimator.  To  do  this,  their  track  management 
performance  using  outputs  of  the  JoTT  filter  is  evaluated 
through  the  OSPA  metric  [17],  [18].  The  OSPA  metric  com¬ 
pares  two  finite  sets  X,  and  Y,  considering  the  difference 
in  their  cardinalities  (i.e.,  cardinality  error)  and  the  positional 
distance  between  their  associated  points  (i.e.,  localization 
error)  after  an  optimal  assignment.  The  sensitivity  of  the 
OSPA  metric  to  these  two  errors  are  controlled  by  the  cut¬ 
off  parameter  c  and  the  order  parameter  p.  However,  for  a 
Bernoulli  RFS  the  OSPA  metric  reduces  to  [38] 


4C)  (X,  Y) 


0  if  26  =  0,  Y  =  0 

c  if  X  =  d),  Y  =  {y} 

c  if  X  =  {x}  ,  Y  =  0 

d(c)  {x,  y)  if  X  =  {x},  Y  =  {y}  , 


where  dX*  (x,y)  =  min  (c,  d  (x,  y))  is  the  cut-off  distance 
between  the  points  in  two  non-empty  Bernoulli  RFSs.  Thus, 
in  this  case,  the  OSPA  metric  is  independent  of  the  order 
parameter  p.  In  addition,  the  major  performance  difference 
between  the  two  estimators  is  expected  to  occur  in  the  accuracy 
of  their  decisions  on  track  confirmation,  track  maintenance, 
and  track  termination.  Then,  the  cut-off  parameter  c  must  be 
set  to  a  high  value  in  order  to  make  the  OSPA  metric  sensitive 
to  cardinality  errors  due  to  false  and  missing  point  estimates. 
In  simulations,  the  OSPA  metric  is  therefore  computed  with 
the  parameters  p  =  1,  and  c  =  25. 

The  target  state  vector  comprises  position  and  velocities  in 
x  -  y  directions,  i.e.,  xk  =  \pXlk,Py,k,  vx,k,  vy,k]'-  If  the  target 
does  survive  with  probability  ps  =  0.90,  its  states  evolve 
according  to  the  coordinated  turn  model  with  the  known  turn 
rate  12  [33],  [39],  i.e.,  the  state  transition  model  is 


xk  —  F  (12)  xk—\  T  Gujk—\} 


where  cok_ i  ~  N  (0,  Qk-\)  is  the  zero-mean  Gaussian  process 
noise  with  covariance  matrix  Qk-\  =  diag  ([0.1, 0.1]')  nr/s2, 
and  the  system  matrices  are 


F(n) 


i 

0 

0 

0 


^  sin(f2T) 

U  U 

1  1— cos(QT) 

1  U 


0  cos  (12T) 
0  sin  (12T) 


T 


0 


1— cos(QT) 

u 

sin(QT) 

n 

—  sin  (f2T) 
cos  (f2T) 

0 

ll 

2 

5 

0 

T 
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where  T  is  the  sampling  interval  and  set  at  T  =  Is  in 
simulations. 

The  single  target  tracking  scenario  runs  for  40s.  The  target 
appears  at  time  k  =  6  and  moves  along  a  straight  line  with 
a  constant  speed  of  |u|  =  5m/s  in  the  x  —  y  directions  until 
time  k  =  20.  Then,  it  starts  maneuvering  at  a  constant  turn 
rate  of  Q  =  2  deg/s  and  is  terminated  at  time  k  =  35. 
The  target  birth  is  modeled  as  a  Bernoulli  RFS  given  by 
{qb,  fb  (at)},  where  the  birth  existence  probability  is  set  at 
qb  =  0.01,  and  the  spatial  pdf  is  defined  as  fb  ( x )  =  N  (ib,  Pb) 
with  mean  otb  =  [ — 70,  70, 0,  0]/  and  covariance  matrix  Pb  = 
diag  ([50,50,10,10]'). 

The  target  is  detected  by  a  sensor  with  state-independent 
detection  probability  pi)  and  the  sensor  has  a  linear  Gaussian 
measurement  model  given  by 

zk  =  Hxk  +  r]k , 

where  r]k  ~  N(0,Rk)  is  the  zero-mean  Gaussian  measure¬ 
ment  noise  with  covariance  matrix  Rk  =  diag  ([1, 1]  )  m. 
With  /2x  2  and  0-2x2  denoting  the  n  x  n  identity  and  zero 
matrices,  respectively,  the  observation  matrix  is  given  by 
H  =  \ f‘2 x 2 •  02x2]-  In  addition  to  noisy  target-originated 
measurement,  the  received  measurement  set  includes  clutter 
points.  In  simulations,  clutter  is  modeled  as  a  Poisson  RFS 
with  the  mean  rate  of  Ac  =  10  per  scan  and  uniform  spatial  dis¬ 
tribution  over  the  surveillance  region  V  =  [—300m,  300m]  x 
[—300m,  300m],  i.e.,  c(z)  =  V _1.  The  performance  of  the 
two  estimators  is  evaluated  by  running  the  same  scenario 
for  500  Monte  Carlo  runs.  In  each  trial,  target-originated 
measurement,  detected  with  p  d  ,  and  independent  random 
clutters  are  generated.  Fig.  4  shows  the  x  and  y  components  of 
the  target  trajectory,  measurements  and  the  position  estimates 
obtained  from  the  JoTT  filter  with  p n  =  0.80  for  one  Monte 
Carlo  trial. 

In  the  JoTT  filter,  the  Bernoulli  RFS  is  represented  as 
a  Gaussian  mixture.  The  maximum  number  of  Gaussian 
components  is  set  at  Jmax  =  100.  They  are  pruned  and 
merged  at  each  time  step  with  thresholds  Tprune  =  10” 3 
and  Tmerge  =  4.0,  respectively  according  to  the  algorithm 
proposed  in  [40]. 

The  track  management  performance  of  the  proposed  JoM 
estimator  and  the  MaM  estimator  are  shown  in  Fig.  5-7  for  dif¬ 
ferent  values  of  the  detection  probability,  ranging  from  high  to 
moderately  small  values,  i.e.,  pn  =  0.95,  0.90, ....,  0.70.  The 
MaM  estimator  confirms  “target-present”  decision  by  com¬ 
paring  the  existence  probability  qk\k  with  the  hard  threshold 
0.5.  However,  the  proposed  JoM  estimator  confirms  “target- 
present”  decision  by  setting  a  lower  margin  than  this  hard 
threshold  considering  how  well  the  JoTT  filter  localizes  the 
target,  i.e.,  the  term  fk\k  ( x )  e  in  (36).  However,  the  maximum 
value  of  f\-\k  (x)e  is  set  by  a  confirmation  threshold  qmin.  In 
simulations,  qmin  is  set  to  0.20.  Thus,  the  track,  for  which 
Qk\k  >  Qmin ,  is  confirmed  by  the  JoM  estimator.  In  particular, 
the  use  of  this  threshold  helps  to  prevent  false  point  estimates 
before  the  target  birth  and  after  the  target  death. 

In  Fig.  5,  it  can  be  seen  that  the  two  estimators  demonstrate 
almost  the  same  track  management  performance  in  terms  of 
track  confirmation  before  the  target  birth  at  time  k  =  6.  In 


Fig.  4.  x  and  y  components  of  target  trajectory,  measurements  and  JoTT  filter 
estimates. 


Fig.  5.  500  Monte  Carlo  run  averages  of  the  OSPA  metric  computed  for  the 
track  management  performance  of  the  JoM  and  MaM  estimators. 

addition,  the  initial  track  maintenance  quality  of  the  proposed 
JoM  estimator  with  insignificant  values  of  the  lower  margin  is 
nearly  the  same  as  that  of  the  MaM  estimator.  However,  the 
JoTT  filter  localizes  the  target  more  accurately  using  target- 
originated  measurements  detected  with  high  probability  as 
time  proceeds.  Therefore,  the  lower  margin  than  the  hard 
threshold  0.5  becomes  significant,  so  that  the  proposed  JoM 
estimator  does  not  prematurely  declare  track  termination  if 
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the  target  is  miss-detected  due  to  sensor  imperfection.  On  the 
other  hand,  large  values  of  the  lower  margin  than  the  hard 
threshold  0.5  result  in  latency  on  track  termination.  That  is, 
after  the  target  is  terminated  at  time  k  =  35,  the  localization 
performance  of  the  JoTT  filter  does  deteriorate  gradually  due 
to  missed  detections.  Hence,  the  track  termination  decision  is 
delayed  in  the  proposed  JoM  estimator. 


Time  Index  Time  index 


Fig.  6.  500  Monte  Carlo  run  averages  of  the  OSPA  metric  computed  for  the 
track  management  performance  of  the  JoM  and  MaM  estimators. 

In  Fig.  6(a),  it  can  be  seen  that  the  track  management 
performances  of  the  two  estimators  are  nearly  the  same 
during  the  tracking  scenario.  These  results  indicates  that  the 
decrease  in  the  existence  probability  of  target  (qk ifc)  cannot  be 
compensated  by  the  value  of  the  lower  margin  computed  in 
the  proposed  JoM  estimator  when  the  target  is  miss-detected. 
However,  Fig.  6(b)  shows  that  the  track  maintenance  quality 
of  the  proposed  JoM  estimator  is  better  than  that  of  the  MaM 
estimator  after  the  target  birth.  That  is,  the  value  of  the  lower 
margin  can  compensate  the  decrease  in  qk\k  due  to  target 
being  miss-detected.  Nevertheless,  the  proposed  JoM  estimator 
suffers  from  track  termination  latency  more  than  the  MaM 
estimator  due  to  the  statistics  indicating  a  well-localized  target 
obtained  from  the  JoTT  filter  after  time  k  =  35. 

Finally,  Fig.  7  shows  the  track  management  performances 
of  the  two  estimators  under  moderately  small  detection  prob¬ 
abilities.  It  can  be  seen  that  the  initial  track  management  per¬ 
formance  of  the  proposed  JoM  estimator  is  better  than  that  of 
the  MaM  estimator.  More  explicitly,  the  MaM  estimator  suffers 
much  more  from  the  track  confirmation  latency  using  the  hard 
threshold  0.5  than  the  JoM  estimator  with  insignificant  values 
of  the  lower  margin.  In  addition,  the  track  maintenance  quality 
of  the  proposed  JoM  estimator  is  better  than  that  of  the  MaM 
estimator  after  a  small  period  of  time  from  the  target  birth. 
However,  as  in  Fig.  6(b),  the  proposed  JoM  estimator  confirms 


track  termination  with  larger  time  delay  after  time  k  =  35, 
compared  to  the  MaM  estimator. 


Fig.  7.  500  Monte  Carlo  run  averages  of  the  OSPA  metric  computed  for  the 
track  management  performance  of  the  JoM  and  MaM  estimators. 

According  to  the  AR  (1)  models  in  Section  V,  time  evo¬ 
lution  of  the  weights  in  (27)  for  different  values  of  detection 
probability  is  shown  in  Fig.  8  and  Fig.  9.  For  considerably 
high  detection  probabilities,  e.g.,  pjj  =  0.95  and  pn  =  0.90, 
the  weights  are  adjusted  as  indicated  in  Section  V,  i.e.,  they 
monotonically  approach  to  their  means  after  the  optimal  JoTT 
filter  converges  to  its  steady-state  with  detections.  However,  if 
the  detection  probability  is  not  so  high  or  close  to  moderately 
small  values,  the  weights  are  predicted  based  on  the  estimation 
error  analysis  in  the  optimal  JoTT  filter.  Consequently,  the 
linear  predictions  can  be  considered  to  be  adaptive  to  the  JoTT 
filter’s  performance. 

VIII.  Conclusions 

In  this  paper,  we  have  proposed  an  optimization  algorithm  to 
compute  the  optimal  value  of  the  unknown  estimation  constant 
in  the  JoM  estimator.  The  optimization  problem  is  defined  in 
terms  of  two  conflicting  objective  functions.  The  first  objective 
function  is  defined  in  terms  of  the  information  theoretic  sense 
and  aims  for  entropy  maximization  by  setting  the  estimation 
constant  to  its  maximum  permissible  value.  In  contrast,  the 
second  one  arises  from  the  constraint  in  the  definition  of  the 
JoM  estimator  and  aims  to  improve  the  accuracy  of  the  JoM 
estimates  by  setting  the  estimation  constant  to  its  minimum 
value  determined  by  the  probability  of  user’s  confidence  level. 
We  used  a  standard  optimization  approach  that  guarantees  a 
Pareto-optimal  solution. 

The  proposed  JoM  estimator  is  used  in  the  JoTT  filter  and 
compared  to  the  other  MAP  type  multitarget  estimator-called 
the  MaM  estimator.  The  simulation  results  demonstrate  that  the 
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Fig.  8.  500  Monte  Carlo  run  averages  of  the  weights  for  high  detection 
probabilities. 
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Fig.  9.  500  Monte  Carlo  run  averages  of  the  weights  for  moderately  small 
detection  probabilities. 


track  management  performance  of  the  proposed  JoM  estimator 
in  terms  of  track  confirmation  latency,  and  track  maintenance 
quality  after  target  birth  is  better  than  that  of  the  MaM  esti¬ 
mator  for  different  values  of  the  detection  probability,  ranging 
from  high  to  moderately  small  values.  However,  the  proposed 
JoM  estimator  suffers  from  track  termination  latency  more 
than  the  MaM  estimator  as  the  localization  performance  of  the 
JoTT  filter  does  deteriorate  gradually  after  target  termination. 


Appendix  A 


To  understand  why  selection  of  too  small  values  for  the 
JoM  estimation  constant  does  not  ameliorate  multitarget  state 
estimates,  quantize  the  FISST  density  /  ({aq, ...,  xn})  for  all 
n  into  small  and  disjoint  hyperspaces  A"  with  volume  en. 
Then  using  the  relation  /  ({aq, ...,  xn})  =  n\f  (aq, ...,  xn)  the 
probability  over  a  small  hyperspace  indexed  by  variable  i,  i.e., 
A”  [2]  is  computed  as: 


Pi  (n) 


1 

n\ 


f  {{xx,  ...,xn})  dxi-.dx, 


f  (aq,  ■■■,xn)  dxx.-.dx 
X  i,...,xni)en. 


n? 


(A.l) 


where  XXi,—,xn.  denotes  the  multitarget  state  estimates 
obtained  from  /  ({aq, ...,  xn})  in  A".  Note  that  if 
f  {{x i, ... ,xn })  is  peaky  over  A",  £  must  be  set  to  a  small 
value  to  satisfy  the  following  condition: 


^2  pi  (n)  - 1- 
n—0  i:AV'£Xn 


Thus,  similar  to  the  quantization  of  a  continuous  random 
variable,  the  entropy  of  the  quantized  RFS  is  defined  as 

OO 

H(xA)=~Y^  X!  Pi  (n)  log  (pi(n)).  (A. 3) 

n— 0  i: 


Upon  substitution  of  Pi(n)  =  f  (aqif  ...,xni)  en  into  the 
logarithmic  function  in  (A. 3),  the  entropy  of  the  quantized 
RFS  can  be  rewritten  as 

OO 

H  (X 'A)=-Y1  Pi{n)l°g{f{x li,...,x„i)en), 

n—0  i: Ap'GA’71 
oo 

=  ~^2  Pi(n)\og(f  (xxz,...,xni))- 

n—0  i:A™€Xri 
oo 

Pi(n)l°  g(e”)> 

n—0  i:A 

(A.4) 

where  the  first  term  is  the  average  self-information  of  the 
joint  symmetric  pdfs  (i.e.,  /  (xx, xn))  over  A(n)  and  the 
second  term  is  the  average  self-information  of  the  uniform 
pdfs  (i.e., U  (xx,  xn)  =  e~n )  over  A(n\ 

For  simplicity  of  analysis,  assume  that  /  (xxit  ■■■,  xn, )  ~  1.0 
over  some  hyperspaces  A”  indexed  by  i*.  For  the  rest, 
/  (xii, xHi)  w  0  and  thus  from  (A.l)  the  probability  over 
those  regions  is  px  (n)  ~  0.  In  this  case,  using  the  convention 
0  log  0  =  0  and  log  1  =  0,  the  first  term  in  (A.4)  is  canceled 
and  the  entropy  of  the  quantized  RFS  simplifies  to 


J2  Pi*  W  log  (en),  (A.5) 

n=0  i*:  A?*  G  A’71- 


Note  that  e  is  small  enough  to  satisfy  the  condition  given  by 
(A. 2).  Similar  to  typical  sequences  with  equal  probabilities  in 
a  typical  set,  most  of  the  total  probability  is  almost  equally 
divided  on  some  hyperspaces  A"»  indexed  by  i*.  Therefore, 
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selecting  too  small  values  for  e  will  not  ameliorate  the 
accuracy  in  multitarget  state  estimates.  On  the  contrary,  the 
entropy  will  get  larger  values  due  to  the  uncertainty  regarding 
what  multitarget  state  estimate  is  true. 

Appendix  B 

For  the  standard  Gaussian  density  /  (x)  defined  in  R"  x ,  it 
follows  from  (3)  that  the  amount  of  self-information  associated 
with  the  outcome  (xi,  ...,xn)  £  ATn  is 

H(f)  -t  <  --log  f(xi,...,xn)  <  H  (/)  +  r,  (B.l) 
n 

where  are  i.i.d.  samples  from  /  (x),  and  H  (/)  = 

0.5  log  (27re)”x  [23]. 

Substituting  for  —  log  /  (xi, ...,  xn)  =  0.5nlog  (27r)nx  + 
0.5J2™=1xfxi  into  (B.l)  and  making  some  algebraic  manip¬ 
ulations  yield 

n(nx  —  2r)  <  xf  x*  <  n  (nx  +  2r) ,  (B.2) 

where  Y2d=t  xfxi  represents  a  thin  shell  around  a  hypersphere 
centered  at  the  origin  of  R"x  as  claimed. 

Appendix  C 


(C.5)  is  called  complementary  slackness  conditions.  Thus,  the 
last  KKT  condition  verifies  that  r*  is  the  global  minimum 
point  of  L  (t,  A*). 

Based  on  the  KKT  conditions  three  possible  cases  are 
distinguished  for  optimality  of  t*  ,  and  A*  =  (AT,  AJ): 

1)  The  constraints  are  both  inactive:  this  means  that  A*  = 
0,  for  i  =  1,2.  Then,  the  optimal  value  of  the  primal 
variable  is  set  to  r*  =  0  to  satisfy  the  last  KKT  condition 
as 

TL 

VtL  (t*,  A*)  = - ^—-1  =  0.  (C.7) 

nx  -  2r* 

2)  The  constraints  are  both  active:  this  means  that  A*  > 
0  for  i  =  1,2.  Then,  the  complementary  slackness 
conditions  contradicts  for  optimality  of  r.  That  is,  (C.5) 
for  i  =  1  requires  that  r*  =  0,  whereas  (C.5)  for  i  =  2 
requires  that  t*  =  0.5  (nx  —  7m;n)  where  7mjn  <C  nx. 
Nevertheless,  the  optimal  value  of  the  primal  variable 
becomes  r*  =  0  if  the  probability  of  confidence  is 
excessively  set  to  7m;n  =  nx.  Thus,  the  last  KKT 
condition  will  have  the  form 


VrL  (r*,  A*)  = 


nx  —  2  t* 


-  1  -  AT  +  2A5 


=  -AT+2A$, 


The  nonlinear  convex  optimization  problem  in  (25)  is  re¬ 
ferred  to  as  the  primal  problem  [31].  The  Lagrangian  of  the 
primal  problem  is  written  as 

L  (r,  A)  =  f0j  (r)  +  X1g1  (r)  +  A 2g  (r) ,  (C.l) 

where  r  and  A  =  (Ai,  A2)  are  called  primal  and  dual  variables, 
respectively. 

According  to  the  duality  theorem,  the  dual  problem  has 
the  same  optimal  solution  with  the  primal  problem  if  Slater’s 
condition  holds  [35].  Associated  with  the  primal  problem,  the 
dual  function  is  defined  as 

g(  A)  =  min  L(r,X), 

T  (C.2) 

=  L(t*,  A), 

where  r*  is  the  primal  solution  and  the  dual  solutions  to  g  (A), 
i.e..  A*  =  (AJ,  A|)  are  the  Lagrange  multipliers  of  the  primal 
problem. 

For  any  convex  optimization  problem  with  differentiable 
objective  and  constraint  functions,  the  necessary  and  sufficient 
conditions  to  analyze  the  optimality  of  t*  ,  and  A*  =  (AT,  AJ), 
are  called  the  Karush-Kuhn-Tucker  (KKT)  conditions  [31], 
[35],  That  is,  r*,  and  A*  =  (AT,  A?])  must  satisfy  the  following 
conditions 

gt  (t*)  <  0,  fort  =  1,2  (C.3) 

A*  >  0,  fort  =  1,2  (C.4) 

A*ffi  (t*)  =  0,  fort  =  1,2  (C.5) 

and 

VtL  (t*,  A*)  =  VTf0j  (r*)  +  A*Vr  9l  (r*)  =  0, 

(C.6) 

where  (C.3)  is  called  primal  feasibility  conditions  of  t* ,  (C.4) 
is  called  the  dual  feasibility  conditions  of  A*  =  (AT,A?;),  and 


in  which  case,  Aj  =  2A^.  That  is,  the  inequality 
constraint  gi  (r)  turns  into  gi  (r)  :  r  <  0.  Then,  the 
constraints  gi  (r)  and  g-2  (r)  contradict  each  other  unless 
they  both  turn  into  the  equality  constraint  given  by 
r  =  0. 

3)  One  active  and  one  inactive  constraint:  this  means  that 
either  A[  >  0  and  A^  =  0  or  AT  =  0  and  A?;  >  0.  If 
A]  >  0  and  A)  =  0,  then  the  complementary  slackness 
condition  for  i  =  1  requires  that  r*  =  0  but  the 
last  KKT  condition  cannot  be  satisfied  for  r*  =  0. 
On  the  other  hand,  if  A]  =  0  and  A?S  >  0,  then  the 
complementary  slackness  condition  for  i  =  2  requires 
that  r*  =  0.5  (nx  —  7min)  and  again,  the  last  KKT 
condition  cannot  be  satisfied  for  AjT  >  0. 

Consequently,  the  inequality  constraints  for  the  nonlinear  con¬ 
vex  problem  are  both  inactive  unless  nx  =  7min-  In  addition, 
t*  =  0  is  the  optimal  solution  for  the  primal  problem.  That 
is,  the  convex  objective  function  fQj  (r)  given  by  (25)  has  a 
global  minimum  at  nx  —  2 r*  =  nx.  Note  that  the  inequality 
constraints  gi{r ),  fori  =  1,2  are  affine  in  addition  to  the 
convexity  of  fQj  (r),  then  Slater  condition  for  the  strong 
duality  holds.  Therefore,  the  strong  duality  indicates  that  the 
optimal  solution  to  the  primal  problem  faj  (r)  can  be  attained 
from  the  dual  problem  [31]. 
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